Frozen Sentences Of Portuguese: Formal Descriptions For NLP
Abstract
This paper presents ongoing research on building an electronic dictionary of frozen sentences of European Portuguese. It focuses on the problems that arise in describing their formal variation with a view to natural language processing.
Similar resources
A Thematic Connectionist Approach to Portuguese Language Processing
In the symbolic approach to Natural Language Processing (NLP), a system can only parse grammatically well-constructed sentences. Within such a context, several linguistic phenomena, e.g. the thematic pattern relationships between the sentence constituents, can be accounted for (these pattern relationships are explained by a rule-based linguistic theory called thematic theory [1]). An alternativ...
Experiments in identifying frozen sentences
This paper describes an experiment on the identification of frozen sentences (or verbal idioms) from European Portuguese in a large corpus of journalistic text. It aims at identifying the main difficulties (or shortcomings) resulting from the intersection of linguistic information encoded in the lexicon-grammar with finite-state transducers that are then applied to texts. The paper shows that, for...
Some Experiments on Clustering Similar Sentences of Texts in Portuguese
Identifying similar text passages plays an important role in many applications in NLP, such as paraphrase generation, automatic summarization, etc. This paper presents some experiments on detecting and clustering similar sentences of texts in Brazilian Portuguese. We propose an evaluation framework based on an incremental and unsupervised clustering method which is combined with statistical simi...
Automatic Alignment of Common Information in Comparable Sentences of Portuguese
The ability to recognize distinct word sequences that convey the same meaning is highly relevant to many applications in NLP, such as automatic summarization, question answering, generation, etc. In this paper we describe our first attempt at aligning common information between similar Portuguese sentences. We propose a method based on lexical and syntactic information and some paraphra...
Extraction of Definitions in Portuguese: An Imbalanced Data Set Problem
Definition extraction is an important task in the NLP and IR fields, in the context of e.g. question answering, ontology learning, and dictionary and glossary construction. When addressed with learning algorithms, it turns out to be a challenging task due to the structure of the data set: definition-bearing sentences are far fewer than sentences that are non-definitions. I...